The Romanian POS Tagger automatically performs four tasks: sentence splitting, tokenizing, POS tagging and lemmatizing. It requires a language model for the POS tagger, a language dictionary and a set of rules, used in the redactor rules system.
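The order of the four tasks can be illustrated with a minimal Python sketch. It is not the tagger's actual implementation: the function names, the `Token` class and the three-word `TOY_DICTIONARY` are hypothetical stand-ins, and a plain dictionary lookup replaces the real language model and rule set.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str = ""
    lemma: str = ""


# Hypothetical toy dictionary: surface form -> (POS tag, lemma).
# The real tagger relies on a language model, a dictionary and rules.
TOY_DICTIONARY = {
    "casele": ("NOUN", "casă"),
    "sunt": ("VERB", "fi"),
    "mari": ("ADJ", "mare"),
}


def split_sentences(text):
    """Task 1: split after sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Task 2: separate word characters from punctuation marks."""
    return [Token(t) for t in re.findall(r"\w+|[^\w\s]", sentence)]


def tag(tokens):
    """Task 3: assign a POS tag (dictionary lookup stands in for the model)."""
    for tok in tokens:
        if not tok.text[0].isalnum():
            tok.pos = "PUNCT"
        else:
            tok.pos = TOY_DICTIONARY.get(tok.text.lower(), ("X", ""))[0]
    return tokens


def lemmatize(tokens):
    """Task 4: assign a lemma, falling back to the lower-cased form."""
    for tok in tokens:
        entry = TOY_DICTIONARY.get(tok.text.lower())
        tok.lemma = entry[1] if entry else tok.text.lower()
    return tokens


def analyze(text):
    """Run the four tasks in order on raw text."""
    return [lemmatize(tag(tokenize(s))) for s in split_sentences(text)]


if __name__ == "__main__":
    for sentence in analyze("Casele sunt mari. Da!"):
        print([(t.text, t.pos, t.lemma) for t in sentence])
```

Each stage consumes the output of the previous one, so the tagging and lemmatizing steps always see tokens that already belong to a single sentence.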
The Romanian NP Chunker uses GGS (Graphical Grammar Studio, http://sourceforge.net/projects/ggs/), a visual tool for describing grammars. A Romanian grammar has been developed that allows fully recursive NP chunks.
The Romanian NE recognizer primarily uses the JRC-Names resource. A secondary NER, based on an ANNIE GATE application, has been customized for Romanian. The GATE application, wrapped as a UIMA primitive engine, additionally provides date, location, money and percentage NEs.
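The primary/secondary arrangement described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `TOY_NAMES` is a hypothetical two-entry stand-in for the JRC-Names resource, and simple regular expressions are swapped in for the GATE application (which is not regex-based in this form); the labels and patterns are invented for the example, and locations are omitted.

```python
import re

# Hypothetical stand-in for the JRC-Names resource: known name -> entity type.
TOY_NAMES = {
    "Traian Băsescu": "PERSON",
    "Comisia Europeană": "ORGANIZATION",
}

# Hypothetical stand-in for the secondary recognizer: simple patterns for
# percentages, money amounts and dates. Not the actual GATE rules.
SECONDARY_PATTERNS = [
    ("PERCENT", re.compile(r"\d+(?:[.,]\d+)?\s?%")),
    ("MONEY", re.compile(r"\d+(?:[.,]\d+)?\s?(?:lei|euro|EUR|RON)\b")),
    ("DATE", re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")),
]


def recognize(text):
    """Return sorted (start, end, label, source) spans found in text."""
    spans = []

    # Primary pass: exact lookup of known names.
    for name, label in TOY_NAMES.items():
        for m in re.finditer(re.escape(name), text):
            spans.append((m.start(), m.end(), label, "primary"))

    # Secondary pass: add pattern matches that do not overlap earlier spans.
    for label, pattern in SECONDARY_PATTERNS:
        for m in pattern.finditer(text):
            overlaps = any(m.start() < end and start < m.end()
                           for start, end, _, _ in spans)
            if not overlaps:
                spans.append((m.start(), m.end(), label, "secondary"))

    return sorted(spans)


if __name__ == "__main__":
    sample = "Comisia Europeană a alocat 500 euro pe 12.03.2011, adica 5% din buget."
    for start, end, label, source in recognize(sample):
        print(sample[start:end], label, source)
```

Running the primary pass first means its spans take precedence; the secondary pass only contributes entities in regions the name lookup left unannotated.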
ATLAS (Applied Technology for Language-Aided CMS) is a project funded by the European Commission under the CIP ICT Policy Support Programme.